Capturing subexpressions: at once stage we had an experimental product that (in block mode only, not streaming) could accurately model libpcre’s capturing semantics. This worked by scanning the data backwards with a backwards version of the pattern, making a trace of the states visited, and tracing forwards through the backward state trace in order to correctly model the behavior that libpcre would have done. This was possible since when we’re tracing forwards through the backwards trace and following 1-bits in our states, we know they will lead to an accept state.
Even more fun was implementing a version of this that checkpointed our backwards state traces periodically, allowing us to store O(sqrt(N)) states in our state trace, at the expense of having to scan twice as much (a generalization would have been to allow us to store O(k*pow(N, 1/k)) states with a cost of scanning k times as much.This had the feel of one of those tricky algorithm interview questions… overall, capturing support it was fun to work on, but no-one wanted it commercially. We probably wouldn’t do it this way today.
From the incredibly nerdy, but utterly delight blog post: https://branchfree.org/2019/02/28/paper-hyperscan-a-fast-multi-pattern-regex-matcher-for-modern-cpus/